Discourse Annotation Working Group Report
Abstract
The classic “success story” of corpus annotation is that of the various syntax treebanks, which provide structural analyses of sentences and have enabled researchers to develop a range of new and highly successful data-oriented approaches to sentence parsing. In recent years, however, a number of corpora have been constructed that provide annotations on the discourse level, i.e., information that reaches beyond sentence boundaries. Phenomena that have been annotated include coreference links, the scope of connectives, and coherence relations. For many of these phenomena there is no general agreement in the research community on how they should be handled, and therefore the “recycling” of corpora by other people and for other purposes is often difficult. (To some extent, this is because discourse annotation deals “only” with surface reflections of underlying, abstract objects.) At the same time, the effort needed to build high-quality discourse corpora is considerable, and thus one should be careful in deciding how to invest it. One way for annotation projects to provide added value is through shared corpora: if a variety of annotation efforts are carried out on the same primary data, the resulting series of annotation levels can yield insights that the creators of the individual levels had not explicitly planned for. A clear case is the relationship between coherence relations and connective use: when both levels are marked individually and with independent annotation guidelines, the correlations between coherence relations, cue usage, and possibly other factors (if annotated) can afterwards be studied systematically. This conception of multi-level annotation presupposes, of course, that the technical problems of setting annotation levels in correspondence to one another have been resolved. The panel on discourse annotation is organized by Manfred Stede and Janyce Wiebe. It aims at surveying the scene of discourse corpora, exploring chances for synergy, and identifying desiderata for future corpus creation projects. In preparation for the panel, the participants have provided the following short descriptions of the various corpora in whose construction they have been involved.